{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Checking if a pair of stocks is cointegrated"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/anaconda3/lib/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.\n",
      "  from pandas.core import datetools\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from statsmodels.tsa.stattools import adfuller\n",
    "import matplotlib.pyplot as plt\n",
    "import quiz_tests\n",
    "\n",
    "# Set plotting options\n",
    "%matplotlib inline\n",
    "plt.rc('figure', figsize=(16, 9))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fix random generator so it's easier to reproduce results\n",
    "np.random.seed(2018)\n",
    "# use returns to create a price series\n",
    "drift = 100\n",
    "r1 = np.random.normal(0, 1, 1000) \n",
    "s1 = pd.Series(np.cumsum(r1), name='s1') + drift\n",
    "\n",
    "#make second series\n",
    "offset = 10\n",
    "noise = np.random.normal(0, 1, 1000)\n",
    "s2 = s1 + offset + noise\n",
    "s2.name = 's2'\n",
    "\n",
    "## hedge ratio\n",
    "lr = LinearRegression()\n",
    "lr.fit(s1.values.reshape(-1,1),s2.values.reshape(-1,1))\n",
    "hedge_ratio = lr.coef_[0][0]\n",
    "\n",
    "#spread\n",
    "spread = s2 - s1 * hedge_ratio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## Question\n",
    " Do you think we'll need the intercept when calculating the spread?  Why or why not?\n",
    " \n",
    "Since the intercept is a constant, it's not necesary to include it in the spread, since it just shifts the spread up by a constant.  We use the spread to check when it deviates from its historical average, so what matters going foward is how the spread differs from this average."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz\n",
    "### Check if spread is stationary using Augmented Dickey Fuller Test\n",
    "\n",
    "The [adfuller](http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html) function is part of the statsmodel library.\n",
    "```\n",
    "adfuller(x, maxlag=None, regression='c', autolag='AIC', store=False, regresults=False)[source]\n",
    "\n",
    "adf (float) – Test statistic\n",
    "pvalue (float) – p-value\n",
    "...\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pvalue 0.0000\n",
      "pvalue is <= 0.05, assume spread is stationary\n",
      "Tests Passed\n"
     ]
    }
   ],
   "source": [
    "def is_spread_stationary(spread, p_level=0.05):\n",
    "    \"\"\"\n",
    "    spread: obtained from linear combination of two series with a hedge ratio\n",
    "    \n",
    "    p_level: level of significance required to reject null hypothesis of non-stationarity\n",
    "    \n",
    "    returns:\n",
    "        True if spread can be considered stationary\n",
    "        False otherwise\n",
    "    \"\"\"\n",
    "    adf_result = adfuller(spread)\n",
    "    pvalue = adf_result[1]\n",
    "    print(f\"pvalue {pvalue:.4f}\")\n",
    "    if pvalue <= p_level:\n",
    "        print(f\"pvalue is <= {p_level}, assume spread is stationary\")\n",
    "        return True\n",
    "    else:\n",
    "        print(f\"pvalue is > {p_level}, assume spread is not stationary\")\n",
    "        return False\n",
    "    \n",
    "quiz_tests.test_is_spread_stationary(is_spread_stationary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pvalue 0.0000\n",
      "pvalue is <= 0.05, assume spread is stationary\n",
      "Are the two series candidates for pairs trading? True\n"
     ]
    }
   ],
   "source": [
    "# Try out your function\n",
    "print(f\"Are the two series candidates for pairs trading? {is_spread_stationary(spread)}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
